Large-scale Cloze Test Dataset Designed by Teachers
Authors
Abstract
The cloze test is widely adopted in language exams to evaluate students’ language proficiency. In this paper, we propose CLOTH, the first large-scale human-designed cloze test dataset, in which the questions were used in middle-school and high-school language exams. With the blanks carefully created by teachers and the candidate choices purposely designed to be confusing, CLOTH requires a deeper language understanding and a wider attention span than previous automatically generated cloze datasets. We show that humans outperform specifically designed baseline models by a significant margin, even when the model is trained on sufficiently large external data. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability to comprehend long-term context as the key bottleneck. In addition, we find that human-designed data leads to a larger gap between model performance and human performance than automatically generated data does.
Similar resources
Dataset for the First Evaluation on Chinese Machine Reading Comprehension
Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, existing reading comprehension datasets are mostly in English. To add diversity in reading comprehension datasets, in this paper we propose a new Chinese reading comprehension dataset for accelerating related research in the community. The proposed dataset contains two diff...
Who did What: A Large-Scale Person-Centered Cloze Dataset
We have constructed a new “Who-did-What” dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus. The WDW dataset has a variety of novel features. First, in contrast with the CNN and Daily Mail datasets (Hermann et al., 2015) we avoid using article summaries for question formation. Instead, each pro...
Quasar: Datasets for Question Answering by Search and Reading
We present two new large-scale datasets aimed at evaluating systems designed to comprehend a natural language query and extract its answer from a large corpus of text. The QUASAR-S dataset consists of 37000 cloze-style (fill-in-the-gap) queries constructed from definitions of software entity tags on the popular website Stack Overflow. The posts and comments on the website serve as the backgroun...
A Selection Strategy to Improve Cloze Question Quality
We present a strategy to improve the quality of automatically generated cloze and open cloze questions which are used by the REAP tutoring system for assessment in the ill-defined domain of English as a Second Language vocabulary learning. Cloze and open cloze questions are fill-in-the-blank questions with and without multiple choice, respectively. The REAP intelligent tutoring system [1] uses ...
Improving Cloze Test Performance of Language Learners Using Web N-Grams
We study the effectiveness of search engines for common usage, a new category of search engines that exploit n-gram frequencies on the web to measure the commonness of a formulation, and that allow their users to submit wildcard queries about formulation uncertainties often encountered in the process of writing. These search engines help to resolve questions on common prepositions following ver...
Journal title:
- CoRR
Volume abs/1711.03225
Publication date 2017